Addressing
Much more important than memorizing a bunch of rarely used instructions is learning the various ways that memory can be addressed. Registers can only hold a few words of data. To handle more data, random access memory (RAM) must be used.
The DECOUT example stored data as part of an instruction (see DECOUT.ASM). This is called immediate-mode addressing:
mov ax, 12345
Immediate mode is useful when the data is constant, but we often want to work with values that can change. These values are called variables, and here's how they're coded:
number dw 6502 ;Define a Word of data
. . .
mov ax, [number] ;get what's stored in "number"
call decout ;display its value
"Number" is the label for the memory location that contains the value 6502. Two bytes are needed to hold a value this large. These two bytes form a "word". The PC uses the convention of storing the low byte first. For example, if we convert 6502 to hex, we get 1966h. The 66h will be stored in the first (lower address) byte, and the 19h will be in the second byte.
This is called the "little endian" convention. It's what Intel uses. Motorola, on the other hand, uses the "big endian" convention, which stores the high byte first. This colorful term is from "Gulliver's Travels". The Big-Endians were a group of people who opposed the Emperor's decree that eggs should be broken at the smaller end before they were eaten.
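For comparison, here is a C sketch of the big-endian layout and of converting a word between the two conventions by swapping its bytes. The names to_big_endian and swap16 are illustrative, not library functions.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 16-bit value into bytes in big-endian order (high byte first),
   the way a Motorola 68000 lays out a word in memory. */
static void to_big_endian(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value >> 8);     /* high byte first */
    out[1] = (uint8_t)(value & 0xFFu);  /* low byte second */
}

/* Exchanging the two bytes converts a word from one convention to the other. */
static uint16_t swap16(uint16_t value)
{
    return (uint16_t)((value << 8) | (value >> 8));
}
```

So 1966h is stored as 19h 66h on a big-endian machine, and swap16(0x1966) gives 6619h, which is what a big-endian CPU would read if handed the little-endian bytes.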
The brackets around [number] are optional in MASM. However, leaving them out invites confusion, and other assemblers, such as NASM, require them. Without brackets, nothing in the instruction itself shows whether it fetches the contents of a variable's memory location or an immediate-mode constant, because "number" could be defined either way: "number dw 6502" or "number equ 6502".
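The two possible readings can be modeled in C. This is a toy sketch under stated assumptions: mem plays the role of the data segment, NUMBER_ADDR is a made-up offset for the label, and NUMBER_EQU plays the role of an equ constant.

```c
#include <assert.h>
#include <stdint.h>

enum { NUMBER_ADDR = 0x0102 };  /* hypothetical offset of the label "number" */
#define NUMBER_EQU 6502         /* like "number equ 6502": a bare constant    */

static uint8_t mem[0x200];      /* toy data segment where "number dw 6502" lives */

/* mov ax, [number]: fetch the word stored at the label's address
   (low byte first, as on the PC). */
static uint16_t fetch_contents(uint16_t addr)
{
    assert(addr + 1u < sizeof mem);
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* mov ax, 6502: the value is encoded in the instruction itself,
   so nothing is read from memory. */
static uint16_t fetch_immediate(uint16_t constant)
{
    return constant;
}
```

Calling fetch_immediate(NUMBER_ADDR) returns 0102h, the address itself rather than the 6502 stored there; that is how NASM interprets "mov ax, number" written without brackets.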
Another quirk of MASM is how it treats colons after labels. It insists on a colon after a label on an instruction, but rejects one after a label on data (such as "number: dw 6502"). Other assemblers, such as NASM and TASM, aren't this finicky.
If you're having trouble getting your code to assemble correctly, don't be too hard on yourself. Experiment with alternative ways of coding things. MASM is a complex and bizarre assembler, and it doesn't always do what's logical.
Sometimes we need to deal with not just one, but many numbers in a table or an array. Pointers and indexes can be used to access these numbers. Here's how the BP register is used as a pointer to access the data in "table":
table dw 1, 10, 100, 1000, 10000 ;define words of data
tblEnd equ $ ;"$" = current memory address
. . .
mov bp, offset table ;point bp at table
ex10: mov ax, [bp] ;get word pointed to by bp
call decout ;display it
add bp, 2 ;point to next word
cmp bp, tblEnd ;is pointer at end of table?
jne ex10 ;loop back if not
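In C, the same pointer-style walk looks like this. It is a sketch: the out array stands in for the calls to decout, and walk_with_pointer is a hypothetical name. C scales pointer arithmetic by the element size, so ++p advances 2 bytes, just like add bp, 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const uint16_t table[] = {1, 10, 100, 1000, 10000};
/* Like "tblEnd equ $": the address just past the last element. */
static const uint16_t *const tbl_end = table + sizeof table / sizeof table[0];

/* Walk the table with a pointer, copying each word to out[]
   (standing in for "call decout"). Returns how many words were visited. */
static size_t walk_with_pointer(uint16_t *out)
{
    size_t n = 0;
    for (const uint16_t *p = table; p != tbl_end; ++p)  /* add bp,2 / cmp / jne */
        out[n++] = *p;                                  /* mov ax,[bp]          */
    return n;
}
```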
We can accomplish the same thing as above by using BP as an index, instead of a pointer:
mov bp, 0 ;initialize index to start of table
ex20: mov ax, [bp+table] ;get word indexed by bp
call decout ;display it
add bp, 2 ;index to next word
cmp bp, 10 ;is index at end of table? (5 words = 10 bytes)
jne ex20 ;loop back if not
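The index version counts bytes rather than words. This C sketch makes that explicit by storing the table as the raw little-endian bytes the assembler would emit; table_bytes and walk_with_index are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The table as laid out in memory: five little-endian words, 10 bytes. */
static const uint8_t table_bytes[] = {
    0x01, 0x00,   /* 1     */
    0x0A, 0x00,   /* 10    */
    0x64, 0x00,   /* 100   */
    0xE8, 0x03,   /* 1000  */
    0x10, 0x27    /* 10000 */
};

/* Walk with a byte index, as "mov ax,[bp+table]" does: the index
   advances by 2 per word and the loop stops when it reaches 10. */
static size_t walk_with_index(uint16_t *out)
{
    size_t n = 0;
    for (size_t bp = 0; bp != sizeof table_bytes; bp += 2)
        out[n++] = (uint16_t)(table_bytes[bp] | (table_bytes[bp + 1] << 8));
    return n;
}
```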
We can even use two index registers. (Strictly speaking, one of them is called a "base" register: in 16-bit code, BX or BP is the base and SI or DI is the index. In fact, if you want to get carried away, you can think of the data segment register as a third index.) Here's how two index registers might be used:
mov si, 2*4 ;select 3rd name from tbl
call showName
. . .
tbl db 'Adok'
db 'Bonz'
db 'claw'
;Display the name indexed by si
showName:
mov bp, 0 ;initialize index
sn20: mov al, [bp+si+tbl] ;fetch character
int 29h ;display it
inc bp ;next character
cmp bp, 4 ;loop for 4 characters
jne sn20
ret
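The effective address in mov al, [bp+si+tbl] is just a sum, as this C sketch shows. show_name mirrors the routine above, with an output buffer in place of int 29h; it illustrates the address arithmetic and is not a translation of the DOS call.

```c
#include <assert.h>
#include <stddef.h>

/* Three 4-character names packed back to back, like tbl db 'Adok' ... */
static const char tbl[] = "AdokBonzclaw";
enum { NAME_LEN = 4 };

/* Copy the name selected by si into out. Each character's address is
   tbl + si + bp: si selects the name, bp steps through its characters. */
static void show_name(size_t si, char out[NAME_LEN + 1])
{
    assert(si + NAME_LEN <= sizeof tbl - 1);   /* ignore the C string's NUL */
    for (size_t bp = 0; bp != NAME_LEN; ++bp)  /* inc bp / cmp bp,4 / jne  */
        out[bp] = tbl[si + bp];                /* mov al,[bp+si+tbl]        */
    out[NAME_LEN] = '\0';
}
```

With si = 2*4 = 8, the loop reads bytes 8 through 11 and produces "claw".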
Memory can be addressed with almost any combination of:
seg_reg:[index+base+offset] (offset = displacement)
These addressing modes are powerful, but there are many restrictions on which registers can be used. Most of these restrictions go away in 32-bit mode, but in 16-bit mode only four registers (BX, BP, SI and DI) can be used for indexing, and double indexing must pair one base register (BX or BP) with one index register (SI or DI). The legal addressing modes are listed on the second page of PCASM.TXT.
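The 16-bit pairing rule fits in a few lines of C. This is a sketch of the rule, not an assembler's actual operand checker; Reg and legal16 are hypothetical names, and R_OTHER stands for any other general register such as AX or SP.

```c
#include <assert.h>
#include <stdbool.h>

/* Registers that might be written inside brackets in 16-bit code. */
typedef enum { R_NONE, R_BX, R_BP, R_SI, R_DI, R_OTHER } Reg;

static bool is_base(Reg r)  { return r == R_BX || r == R_BP; }
static bool is_index(Reg r) { return r == R_SI || r == R_DI; }

/* Is [a+b+displacement] a legal 16-bit memory operand? */
static bool legal16(Reg a, Reg b)
{
    if (a == R_NONE) { a = b; b = R_NONE; }  /* move a lone register into a */
    if (b == R_NONE)                         /* zero or one register        */
        return a == R_NONE || is_base(a) || is_index(a);
    /* two registers: one base plus one index, in either order */
    return (is_base(a) && is_index(b)) || (is_index(a) && is_base(b));
}
```

For example, [bx+si] and [bp+di] pass, while [bx+bp], [si+di] and [ax] are rejected.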
Although they're not usually shown, remember that the segment registers are always involved in accessing memory. When data memory is accessed, the data segment (DS) register is normally, but not always, used. As we saw in the 2PLUS3 example, the DS register can be overridden by specifying another segment register:
MOV ES:[6], AL
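Behind every such access, an 8086 in real mode forms a physical address by multiplying the segment value by 16 and adding the offset. A minimal C sketch of that arithmetic follows; physical_address is an illustrative name, and the sketch ignores the 1 MB wraparound of the original 8086.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: shift the segment left 4 bits (times 16)
   and add the 16-bit offset. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

If ES held 1234h, MOV ES:[6], AL would write to physical address 12346h. Different segment:offset pairs can also name the same byte, such as 1000h:0010h and 1001h:0000h.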
A small warning: when the BP register is used to access data, the default segment register is not DS but the stack segment (SS) register. For the COM files used in this article, this makes little difference because DOS sets SS = DS when it starts the program. However, if you use an EXE file, be aware that SS is not normally the same as DS.
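The default-segment rule can be summarized as a small C function. This is a simplified sketch covering ordinary data operands only (string instructions and PUSH/POP follow their own rules); Seg and effective_segment are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEG_DS, SEG_SS, SEG_ES, SEG_CS } Seg;

/* Segment used for a 16-bit data operand: an explicit override such as
   ES: always wins; otherwise SS if BP is part of the address, else DS. */
static Seg effective_segment(bool uses_bp, bool has_override, Seg override)
{
    if (has_override)
        return override;
    return uses_bp ? SEG_SS : SEG_DS;
}
```

So mov al, [bp+si+tbl] in the showName routine reads through SS, which is harmless in a COM file but would read the wrong memory in an EXE unless an override such as DS: is added.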